How does the first water shell fold proteins so fast ? 
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First shells of hydration and bulk solvent plays a crucial role in the folding of proteins. Here, the 
role of water in the dynamics of proteins has been investigated using a theoretical protein-solvent 
model and a statistical physics approach. We formulate a hydration model where the hydrogen 
bonds between water molecules pertaining to the first shell of the protein conformation may be either 
mainly formed or broken. At thermal equilibrium, hydrogen bonds are formed at low temperature 
and are broken at high temperature. To explore the solvent effect, we follow the folding of a 
large sampling of protein chains, using a master-equation evolution. The dynamics shows a clear 
mechanism. Above the glass-transition temperature, a large ratio of chains fold very rapidly into 
the native structure irrespective of the temperature, following pathways of high transition rates 
through structures surrounded by the solvent with broken hydrogen bonds. Although these states 
have an infinitesimal probability, they act as strong dynamical attractors and fast folding proceeds 
along these routes rather than pathways with small transition rates between configurations of much 
higher equilibrium probabilities. At a given low temperature, a broad jump in the folding times 
is observed. Below this glass temperature, the pathways where hydrogen bonds are mainly formed 
become those of highest rates although with conformational changes of huge relaxation times. The 
present results reveal that folding obeys a double-funnel mechanism. 

PACS numbers: 87.14.et 



To this date, the three-stranded /3 sheet is the faster 
folder finding its native structure in the amazingly short 
time of 140 nano-seconds [1]. The protein folding prob- 
lem is still considered as one of the major unsolved prob- 
lems of science [2] and the answer to the Levinthal ques- 
tion [3] "how a protein can fold so fast ?" remains a 
"grand challenge" [4]. 

Protein folding is the process whereby a protein folds 
into its native structure. The slowest folding proteins 
may require a few minutes due among other factors to 
proline isomerization [5], They fold passing through 
many intermediate states. On the other hand, many 
small single-domain proteins fold very rapidly over time 
scales of a few microseconds [6, 7]. For many of these 
proteins, the folding process is a single exponential func- 
tion of time [5, 6] and is modeled by a two-state mass 
action model and an Arrhenius diagram on which the 
free energy of some ensembles of chain conformations is 
plotted as a function of reaction coordinate, usually not 
known. This diagram exhibits a transition state between 
the unfolded and the native states [8]. 

Moreover, some ultra fast folders exhibit more com- 
plex kinetics with non- Arrhenius behavior [10] (i.e. non- 
linear dependence of the logarithm of the folding rate on 
the inverse of the temperature). Some results show that 
the activation energy is positive at room temperature, 
decreases as the temperature increases and may become 
negative at high temperature [11]. It has been suggested 
that this could arise from the temperature dependence of 
the hydrophobic effect [12, 13]. 

An alternative to this transition-state view is the con- 
cept of folding funnel [14]. This energy-landscape pic- 
ture is based on the idea of minimal frustration [15], 
which states that the evolutionary mechanism has re- 



tained those protein sequences that have a funnel-like 
energy landscape. In that concept, the height of the fun- 
nel represents the conformational energy and its width 
represents the entropy of the subset of chain conforma- 
tions of a given energy [16-20]. The top of the funnel is 
populated by the huge number of denaturated configu- 
rations with a large energy and entropy and the bottom 
with the unique native structure of very low energy and 
quasi-nil entropy. Each protein chains folds from the top 
of the funnel towards the bottom. 

The transition-state theory and the folding-funnel pic- 
ture are two different approaches. The first one describes 
well the two-state kinetics, but does not explain why fold- 
ing is so fast. The second one explains well why folding is 
so fast, and the thermodynamic free energy barrier, that 
gives rise to two-state kinetics and makes transition-state 
theory applicable to the folding process, is essentially of 
cntropic nature. 

Lattice models of proteins are among the favorite tools 
for the theoretical study of folding. The microscopic rep- 
resentation of the proteins is simplified to allow large 
sampling of the configurational space. Proteins are mod- 
eled as self-avoiding-walk chains of beads, which are lo- 
cated on a two or three-dimensional square or cubic lat- 
tice. For small-length chains, a full enumeration of the 
conformations allows the exact calculation of the parti- 
tion function, and, thus of statistical averages [21-24]. 
The Hamiltonian of a given conformation of the chain 
results from the interaction of the first neighbors of the 
beads on the lattice, but not in the sequence. The more 
popular model of couplings between monomers are known 
as the G6[22], the HP [21, 25-27] and the random-energy 
model(REM) [28-33]. In the Go model, the interaction 
between the beads of a given compact structure, cho- 
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sen as the native conformation, is set to -1 and all the 
other couplings to 0. In the HP model, the sequence 
of the protein is given in terms of a scries of hydropho- 
bic (H) or polar (P) residues and the coupling between 
two hydrophobic beads is set to -1 and that involving at 
least one polar monomers to 0. In the REM, a matrix of 
couplings between all pairs of residues is constructed by 
drawing random numbers from a Gaussian distribution. 

Despite the numerous results obtained with such mod- 
els, they fail to reproduce a fundamental feature of a 
protein : its cold denaturation. This mechanism is as- 
sociated to the loss of stability of the native structure 
upon cooling down the system [34-36]. For half a cen- 
tury, it has been well known that water plays a crucial 
role in the mechanism of folding [37-39] and an under- 
standing of cold denaturation requires a finer model of 
the solvent than the temperature-independent attractive 
parameters used in Go, HP or REM. Some recent refine- 
ments of the models, including temperature dependence 
of the hydrophobic effect, have allowed to model the cold 
transition [24, 40-42] to be described. 

As the physics of protein folding proceeds in water and 
the interactions of the protein chemical groups are solvent 
mediated, the representation of the solvent is of great im- 
portance [43] . The hydrophobic effect is an active field of 
research by itself [44-48]. It represents the tendency of 
water to exclude nonpolar solutes. It results from a dis- 
ruption of the network of hydrogen bonds between water 
molecules caused by the transfer of an nonpolar solute 
into water. The energy variation of this process is fa- 
vorable at room temperature, whereas the entropy cost 
leads to a large positive free energy of transfer. In ad- 
dition, the physics of the solvent-mediated interactions 
of the protein may be captured by studying the interac- 
tion of two nonpolar solutes immersed in water [43] . This 
prototypical interaction can be handled by averaging e.g. 
the degrees of freedom of the solvent in the free-energy 
function of two methane solutes in explicit water [49, 50]. 
The profile of the free energy as a function of the distance 
between the two methane solutes shows a deep minimum 
for the contact distance between the two molecules and 
another one, less pronounced when they are separated by 
a distance slightly smaller than one solvent molecule di- 
ameter. A maximum, higher than the free energy of the 
two isolated solutes arises between these two configura- 
tions. This maximum is known as the desolvation bar- 
rier. As the temperature increases, the contact between 
the nonpolar solutes becomes more favorable as the des- 
olvation barrier is reduced. This barrier tends to favor a 
high thermodynamic cooperativity of the model in con- 
trast with a model without desolvation barrier [51, 52]. It 
has also been shown that the physics behind this barrier 
is responsible for the large diversity in the folding rates, 
similar to what is observed experimentally [53] 

Moreover, recent experimental work has shown that 
structural fluctuations of the solvent may control struc- 
tural fluctuations of the protein [54-58] . It has also been 
observed that motion of hydration water drives protein 



dynamics [59] . This could be responsible for the protein- 
solvent dynamical transition connected with the liquid 
glass transition of hydration water [60]. These results 
show the importance of the degrees of freedom of the hy- 
dration first shell for the dynamics of the proteins. A 
theoretical model of the hydrophobic effect introduced 
by Muller [61] and extended by Lee and Graziano [62] al- 
lows to separate the contribution of the hydrogen bonds 
of the first shell and that of the bulk water. The basic 
idea stems from the result that the hydration of nonpolar 
solutes presents a large entropy cost and a small favorable 
energy. The hydrogen bond breakage in the bulk is con- 
sidered as a two-state equilibrium between the formed 
and the broken hydrogen bonds. The equilibrium con- 
stant between the two states is related to the fraction of 
formed hydrogen bonds and to the difference in enthalpy 
and the ratio of degeneracy resulting from the breaking 
of one hydrogen bond. A similar description is used for 
to the water molecules of the first shell. It considers that 
the thermodynamics of a broken hydrogen bond in the 
first shell is the same than that in the bulk and gives a 
picture of the hydrophobic effect based on the enthalpy 
gain and entropy cost arising from the creation of a bond 
in both situations. A little bit later, Lee and Graziano 
[62] pointed out that the energy associated to a broken 
hydrogen bond is not the same for a water molecule of 
the first shell and for one of the bulk. The presence of an 
nonpolar solute induces the breakage of a hydrogen bond 
of the first shell, leading to a more unfavorable energy 
than the same event in the bulk. The two-state model of 
the hydrophobic effect has been applied [63] to the two- 
dimensional Mercedes-Benz model of water [64, 65] . As a 
result, they give a spectrum where the non-degenerated 
ground state is for the formed hydrogen bond in the first 
shell and the highly degenerated states for the broken 
hydrogen bond in the first shell corresponds to the larger 
value of the spectrum. The energies and the degenera- 
cies of the formed or broken bonds in the bulk are found 
between the two previously described. 

In this paper, this picture has been reduced further by 
gathering together the two close energy levels associated 
to the broken and formed hydrogen bonds of bulk water 
[40] and has been applied to a lattice model of protein to 
study the effect of the first shell on the protein dynamics. 
Aside from the hydrophobic model itself, how the solvent 
is simulated has a significant impact on the energy land- 
scape of the protein. Explicit solvent models are very 
computationally expensive [66]. Implicit solvent models 
have been developed to take into account the solvent as 
a mean field effect [67-73]. Yet, results obtained from 
explicit simulations do not always agree with those from 
implicit models [74, 75]. Up until now, the strategy to 
follow the kinetics of the proteins consisted in averaging 
the degrees of freedom of the solvent by calculating the 
free energy of solvation of each protein structure and the 
transition rates between two protein conformations. The 
system evolves along effective routes made of conforma- 
tions surrounded by an averaged solvent. 
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Here, as the solvent model allows it, we have calculated 
the rate between two protein-solvent microscopic configu- 
rations and grouped together some equivalent transitions. 
The physical pathways are microscopic routes in the pro- 
tein and the solvent configurational space, not "mean" 
routes in the conformational space of the protein sur- 
rounded by an effective solvent. 

The dynamics of a large set of chains in the solvent 
is calculated using a master equation evolution. In the 
spirit of the concept of folding funnel [14, 16, 17, 19], a 
picture of the folding in terms of two surfaces, depend- 
ing on the state of the hydrogen bonds of the first shell 
solvent, drawn in an entropy-energy plot, is given. The 
mechanisms responsible for the fast folding and the glass 
transition are detailed in the body of the paper. In the 
first part, the protein and the solvent models are de- 
scribed. In the second part, the equations of the evolu- 
tion are established. In the third part, the mechanism, 
which occurs during the fast overcoming of kinetic barri- 
ers is explained, then the mechanism responsible for the 
glass transition is revealed. 

I. MODEL. 

The microscopic Hamiltonian of the chain in conforma- 
tion to, surrounded by a first shell of solvent molecules in 
configuration (3 and bulk solvent molecules in structure 
a is denoted : 

<T/mic 'l/ c h i -i/shell _■_ o/bulk 

The first term results from the intrachain interactions, 
the second one from the contribution of the molecules of 
the first shell solvent in interaction with the protein and 
the last one from that of the bulk water. 

A. Protein Model 

The proteins are represented as self-avoiding walk 
strings of ./V monomer beads located on a two- 
dimensional lattice [21, 23, 33] (here TV = 12). This 
length of the chain is short enough to allow analytical 
calculations for the dynamics and long enough to give 
interesting results. The compactness of a structure m is 
the number of intrachain contacts of the chain confor- 
mation to ; Cm — J2i>j+3 ^ij 1 ^ wnere A^™* 1 = 1 if the 
monomers i and j arc first neighbors on the lattice and 
otherwise. The accessible surface area of the conforma- 
tion to to the solvent is defined as the number of links 
between the chain beads and the empty sites of the lat- 
tice : A m = 2N + 2 — 2C m . The intrachain Hamiltonian 
of the peptide structure to is : 

i>j+3 

To model the heterogeneity of the sequence of amino- 
acids of the chain, the couplings Bij between monomers 



i and j are drawn at random from a Gaussian distribution 
centered on —2 with standard deviation equal to 1 [29]. 
Such a way of designing the sequences leads to create a 
configurational space with small energy gaps between the 
structures of bottom of the energy spectrum. A particu- 
lar compact (native) structure does not emerge as a sta- 
ble conformation of the sequence with a large energy gap 
with other compact conformations. To increase the sta- 
bility of the native conformation of the sequence[76, 77], 
we select a compact conformation for the native structure 
of the sequence and we rank the couplings such that the 
minimum ones are associated with the native contacts. 



B. Solvent Model 

For each chain conformation, the empty nodes of the 
lattice models the solvent effect[24, 40]. We do not at- 
tempt to introduce a fine description of the structural 
properties of solvent around proteins itself, but we de- 
scribe a realistic solvent effect on the weights of the chain 
conformations. 

The bulk water contribution is simply modelled by an 
extensive negative free energy term which guides chains 
towards compact structures. The microscopic structures 
of the first shell solvent around any given chain conforma- 
tion are separated into two groups depending on whether 
most of the hydrogen bonds are formed or not. Hence, 
each protein conformation has two possible values for its 
energy depending on the structure of the first shell: one 
for a ground state (GS) associated to a rather organized 
first shell and another one for an excited state (ES) with 
mainly broken hydrogen bonds. 

For each chain conformation m, the first shell interac- 
tion is extensive with respect to A m . The links between 
a solvent node and the chain beads account for the first 
shell contribution. The non-degenerated ground state 
denoted by f3 — models the first shell water molecules 
with formed hydrogen bonds around the protein. It is 
taken as the energy reference which equals 0. The excited 
states, corresponding to /? > 1, are for the g^ 71 first shell 
structures with broken hydrogen bonds. Their energy is 
A m £sh- The Hamiltonian of the first shell solvent for the 
chain structure to is written : 

Utf = A M a(P) with{^: lifl ^^ lm 

When one intrachain contact is formed, two monomers- 
solvent bonds are broken. Then, after removing the con- 
stant term, the pure solvent contribution is a simple func- 
tion of 2C m [24, 42]. The factor of 2 guarantees that the 
solvent volume does not depend on the chain structure. 
For each chain structure to, the <?^ m -fold degenerated 
Hamiltonian of the pure solvent is independent of the 
bulk micro-state : 

U b ^ = 2C m e bk for 1 < a < g™™ 
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FIG. 1: The solvent around a chain conformation chosen at 
random. The unique solvent configuration where the hydro- 
gen bonds between water molecules and between water and 
the chain are formed (shown in the left picture) is the non- 
degenerated ground state (GS) of the first shell and the other 
structures (one is shown in the right picture), where the hy- 
drogen bonds are mostly broken, are grouped together in the 
highly degenerated excited state (ES). Only, the highly orga- 
nized and highly disordered solvent configuration are taken 
into account. All the other cases are not considered in this 
two-state picture which only takes into account the lowest 
energy and largest entropy macro-states which are the most 
important contribution for the statistical physics approach. 



The results given in the paper hold while the parame- 
ters are ranked as follow : < £bk < £sh and <?bk < 9 sh- 
in the spirit of the results obtained by Silverstein et 
a/[63], the values of the solvent parameters are ranked 
as follow £bk = 0.2, e s h = 0.6, <j>bk = 2 and g s h = 3. 
The reported results in this paper are quite robust with 
regards to change the parameters holding the above rank- 
ing equation. However, for some technical reasons (dis- 
cussed below) if <?bk or/and g s h are chosen too large, the 
computational times of the calculation becomes too pro- 
hibitive. 

The energy and the degeneracy of the ground and ex- 
cited macro-state of each peptide conformation are : 
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The native conformation is the structure of largest value 
of P r ^ q determined by a full enumeration of the conforma- 
tional space of the chain at low temperature. For the set 
of couplings Bij and solvent parameters used here, the 
native conformation is a compact structure of intrachain 
energy -9.895. The folding transition temperature is de- 
fined as, the melting temperature T m of the experimen- 
tal literature[78, 79] at which the equilibrium probability 
of the native structure equals that of all the other de- 
natured conformations One specific chain structure (the 
native conformation) has a larger equilibrium probabil- 
ity to occur than all the other conformations for T < T m 
or in others words P^ t (T m ) = 0.5 (here T m = 0.90). 
Figure 2 shows that the equilibrium probability of occur- 
rence of this native structure (Nat) surrounded by water 
with formed hydrogen bonds reaches one at low temper- 
ature. Other solvent configurations around Nat become 
relevant for T > T (here T = 0.45). Last, the prob- 
ability of occurence of the native structure with of the 
first shell solvent in ES equals that in GS at a specific 
temperature denoted T* (here T* = 0.54). 

We note in passing, as the solvent parameter of the GS 
of the first shell is lower than that of the bulk, it could be 
possible to observe cold denaturation of the chains, if the 
GS of the extended chains would be the state of lowest 
energy among the whole configurational space [24, 42]. 
Here, however, the parameterization chosen avoids this 
possibility in order to study only the folding mechanism. 

Figure 3 shows the conformational distribution as func- 
tion of the number of native intrachain contacts calcu- 
lated as follow : 



F T {Q) = -T\n}^S(Q-Q, 



,)^ W exp(-?ir;/T) 



-TV mac n_f 

_ oA„ 
9ma — .9 s h 



ch 



2C m e b k + crA 



2C„ 
Sbk 



with a = or 1 



Results for the chain in interaction with the 
solvent 



The thermal equilibrium probability of each macro- 
state is given by : V^ a = g ma exp(~n™ c /T)/Z(T) and 
that of a chain structure is = J2a=o ^ma- Here the 
Boltzmann constant is set to 1 and the partition function 



where <5(0 = 1 and otherwise and Q m is the number 
of native contact of the chain structure to. Under dena- 
tured conditions (T > T m ), the free energy profile show 
a barrier between the native and non-native conditions. 
The sets of conformation with one or two intrachain con- 
tacts, surrounded by solvent in ES (see fig. 4) have the 
largest probability of occurrence. At the melting temper- 
ature, the same barrier still separates the equiprobable 
native and non- native populations. Under strong native 
condition (low temperature), the shape of the free energy 
profile is similar to that observe for a downhill folding. 

Indeed, as the temperature decreases, the more prob- 
able non-native population shifts from the sets with few 
native contacts to that with the maximal native contacts 
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conformation m is 



FIG. 2: Equilibrium probabilities of occurrence of the native 
structure (black solid line) as functions of the temperature 
(given in arbitrary units) with the contribution of the ground 
state (red long dashed line) and of the excited state (green 
dashed line). At a temperature To, the probability that the 
hydrogen bonds are broken becomes significant and at T*, 
the probability to observe hydrogen bonds around the native 
conformation is the same as that to observe water with broken 
hydrogen bonds. The probability of occurrence of the native 
structure equals 1/2 at T m . 
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FIG. 3: Free energy profile for the lattice model at the three 
temperatures indicated. P(Q) is the probability of occurrence 
over Q at thermal equilibrium. 
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These curves as function of the temperature only depends 
on the chain exposure to the solvent, i.e. on the compact- 
ness (fig. 4). They exhibit a maximum at the same tem- 
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FIG. 4: The heat capacity of the solvent around the chain 
structures depends on the compactness of the conformation. 
At low temperature, the chain are surrounded by rigid cages 
of water molecules. The higher the compactness, the higher 
is the temperature of occurrence of the chain with broken 
hydrogen bonds of the solvent. 

perature: T* = 0.54. It is the temperature of equiproba- 
bility of occurrence of the two phases: broken and formed 
hydrogen bonds of the first shell around the peptide. For 
To < T < T* , the solvent configurations with a water 
cage around the peptides is preferred to the broken hy- 
drogen bonds. For T < To, it is the only relevant state. 
Above T*, the solvent occurs in the excited macro-state. 
This is in good agreement with the result obtained for 
the thermodynamics of the chains presented above. 



II. TIME EVOLUTION. 



D. Results for the solvent at equilibrium 

At thermal equilibrium with a bath at temperature T, 
the probability of occurrence of the conformation m with 
the solvent in micro-state (a/3) tend towards : p^ a p = 

exp {-UZlp/T) /Z(T). 

The mean energy of the first shell solvent around chain 



To explore all the possible routes from the non-native 
structures to the native conformation, the probabilities 
of each micro-state, composed of one protein structure 
in interaction with one solvent configuration, evolve us- 
ing a continuous time Markov process applied to a large 
sampling of peptides. 

Master equation approach to protein folding has al- 
ready be used in lattice model with an effective solvent 
[80]. It is shown in appendix A, that a master equation of 
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the macro-states may be deduced from the master equa- 
tion of the micro-states. In a finite time approach, the 
probabilities of the macro-states evolve following the Eu- 
ler algorithm [81] : 

V™ c (t + St) = v™(t) + ^ E E W^TO (*) 

m' <t' 

where 

v {0) , 

Y , , — a mm (1 4- exr>((l-l mac - W nl ? c , ) /T)^ 1 
for m m' or a ^ a 1 and 

As explained in appendix A, V^ n , = 1 if structures m 
and m' are connected by a one monomer move (either 
a corner flip or a tail move) or if m = m! . The charac- 
teristic time associated to a chain move (t™ 1 ^, = r c if 
m 7^ m!) and to a solvent move (t™ 1 ^, = r s if m = m!) 
are set to r c = 1 and r s = 0.001 <C r c . 
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FIG. 5: Logarithm of the waiting time t p to observe the native 
structure with a probability equal to p plotted as a function 
of the inverse of the temperature for p — 0.02 to p = 0.20 by 
steps of 0.02. 

A sufficient condition to conserve the norm of the prob- 
ability vector (i.e. J2 m <r 'P-mcfit) — 1) is satisfied by fixing 
St = 1/ max m[ ,{y mffimff }. As the values of g m cf and then 
those of Yma.m'a' may be huge, the value of St is tiny. 
Then, the evolution of the probability vector is very slow. 

The simulations of folding start with the initial con- 
dition : V™ c (0) = l/2N con{ where 7V conf = 15019 is 
the number of chain structures. The probability of oc- 
currence of conformation m after time t is P m (t) = 
'PmjoW + T'miTW- The waiting time to observe the na- 
tive structure with a probability p is noted t p . The main 
results of the calculation are summarized in fig. 5 and 



6. They exhibit some findings on the out of equilibrium 
folding of a large sampling of chains at different temper- 
atures. 

Figure 5 shows a non-Arrhcnius behavior of the model. 
Because of the tiny value of St, the early events of the 
folding are shown, here. The remaining part of this plot 
will be deduced from the results given below. It will 
be shown that even if the curves may be well fitted by a 
Vogel-Fulcher-Tamman function (t p (T) oc exp(—E a /(T— 
To))), which is the signature of an a-relaxation with a 
dynamical temperature To = Tq(j>), the remaining part 
of the curves exhibits a more complex shape which can 
not be capture by effective solvent models. 




FIG. 6: Evolution of the probabilities of occurrence of native 
structure (back solid line), the contribution of excited state 
(green dashed line) and of the ground state (red long dashed 
line) as functions of the time at different temperatures. For 
temperature of 0.10 and 0.20, some chains topologically close 
to the native structure reach it very fast because no activation 
barrier occurs in their pathway. Then, the probability of the 
ES decreases monotonically to zero and that of the native 
structure reach a plateau and the kinetics is frozen. As the 
number of chain structures of the basin do not depend on 
the temperature, the plots have very similar shapes and the 
values of the probabilities are very close to each other showing 
that a local phenomenon is taking place. For temperature 
of 0.30 and 0.40, the ES of the native conformation (which 
have a null equilibrium probability) acts as an attractor, in 
the early events, and reach very rapidly a maximum of its 
probability of occurrence, much larger than its equilibrium 
probability. A fast transition takes place towards the GS of 
the native structure. Then, a slower dynamic regime guides 
the other chains towards the native structure. Now, the values 
of the probabilities depend on the temperatures showing that 
a global mechanism dominates. 

Figure 6 shows that, even if its equilibrium probability 
is infinitesimal, the excited state of the solvent acts as a 
strong dynamical attractor above a particular tempera- 
ture to be discussed later. Below this temperature the 
kinetics is the same whatever the temperature. 
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III. RELAXATION TIMES OF THE MOVES. 

Only two connected macro-states (ma) and (mV) are 
considered for a while. The equations 6 and 7 show that 
the rate of the transition (mV) — > (ma) depends on 
the degeneracy of the macro-state (ma) and not of that 
of (m'a'). In other words, the increase of the prob- 
ability of (ma) and the decrease of that of (m'a 1 ) is 
due to the degeneracy of (ma) and to the difference of 
energy between the states with (m'a 1 ). After thermal 
equilibration, the probability of (ma) would go towards 

= V^J(V^ a + V2 al ) The solution of the system 
of equations 6, for only two states is : 



TP, 



TS, 



Tp 2 



= + [TWO) - P&] exp(-t/r 



mer.m' u' 



with a very small relaxation time : 

T rno y m' a' (y^irifT.m' cr' "T" Y ma m ' a ') T~ rn _ m / (2) 

which now depends on the energies and degeneracies of 
the macro-states. While numerous chains initially in 
state (ma) move into state (m'a'), some of them may 
go back to (ma). As a consequence, the relaxation times 
depend on forward and backward rates of a move. As 
an aside, if the move from (ma) to (m'a') were allowed 
and not the backward transition, then the relaxation time 
would simplify to l/Y mc r,m'o>- 

As the degeneracy of the excited state is always larger 
than that of the ground state, the value of the rates be- 
tween two excited states of the chain is larger than that 
between two ground states (if the energy difference is of 
the same order) As a consequence, the relaxation times 
of the former transitions are always very small at not too 
low temperature. They are supposed to simulate peptide 
evolving in a fluid solvent. In contrast, the relaxation 
times of the latter transitions are larger. This models 
chains evolving in a viscous medium. Thus, the dynamics 
of the chains surrounded by water with broken hydrogen 
bonds is faster than that of low degenerated structures 
in interaction with solvent with formed hydrogen bonds. 

In addition, the more extended the structure, the 
larger the degeneracy of the excited states and the 
smaller is the relaxation time. This is because a more 
extended chain has a larger exposure to the solvent and 
thus a lot of hydrogen bonds are broken in the first shell 
and as a consequence the dynamics is faster. 

Last the connexion between the excited and ground 
states of the same chain conformation (m = m!) have a 
small relaxation time because t s <C t c . The difference 
between that case and the fluid connexion depends on 
the ratio t c /t s . 

IV. FAST FOLDING MECHANISM 

To understand the mechanism underlying the fast fold- 
ing, a possible pathway leading to the native structure, 
shown in fig. 7, is considered as a first approach of the 
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FIG. 7: Top: A possible pathway with five chain structures 
connected by one monomer moves ending in Nat taken as an 
example to capture the role play by the solvent in the fold- 
ing kinetics. The energy of the GS is given above of each 
conformation. A single route from Tpi to Nat is considered, 
to simplify the study of this first approach, and the reaction 
coordinate is simply a function (not given) of the number of 
conformational changes necessary to reach Nat. Bottom: the 
waiting times to observe a ratio p of proteins in Nat, starting 
from an equiprobability of all the states, show a glass transi- 
tion at temperatures depending on p (shown in the inset). 



problem. This pathway consists in five chain structures 
connected by one monomer moves. In contrast with the 
whole conformational space where there exist a great 
number of routes from a given conformation to the na- 
tive structure, in this section we first consider a sim- 
plified trajectory where there is a unique pathway (via 
TSi, Tp2 and TS2) from Tpi to Nat. Structures Tpi 
and Tp2 have five intrachain contacts and thus, smaller 
energy than TSi and TS2 which have three intrachain 
contacts. Then, the GS of the structures Tpi and Tp 2 
should act as kinetics traps and that of TSi and TS2 
as transition states towards the native conformation. At 
t = 0, the initial probability of each macro-state is set to 
P(t = 0) = 0.1. An Euler algorithm is applied to simu- 
late the evolution of the probabilities of the ten macro- 
states of this subsystem composed by the ES and the GS 
of this five chain structures. The waiting time t p to ob- 
serve the native structure with a probability equal to p 
is plotted for p = 0.30 to p = 0.65 by steps of 0.05 as 
function of the temperature. At t — 0, the probability of 
occurrence of Nat is already 0.20 and for p < 0.30, the 
waiting time is tiny since a many chains in the TS2 states 
fold instantaneously in Nat. For T < 0.20, the waiting 
time to reach a ratio p = 0.30 becomes constant, but that 
to observe a larger ratio become huge. 
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For p > 0.30, the waiting times increase continuously 
as the temperature is decreased until a temperature, de- 
pending on p and denoted by T g (p), where a broad dy- 
namical transition occurs. The kinetics is fluid above T g 
and glassy below. 

Figure 8 shows the very early events of the folding 
of this small system. The ES of Nat relax into the GS 
via a, very fast, solvent transition. At a temperature 
independent time, to, the probability of the ES of Nat 
becomes very small. 
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FIG. 8: . Evolution of the probabilities of the native structure 
(black solid lines) and the ES (green dashed lines) and the GS 
(red long dashed lines) contributions as functions of the time 
for different temperatures. 

At low temperature (T = 0.10 or T = 0.20), half of 
the chains in conformation TS2 fold in Nat and the other 
half goes to Tp2 between the time to and a tempera- 
ture dependent time, t* after which the probability of 
occurrence of the native structure becomes constant for 
a while. Then, a long plateau with p w 0.3 occurs where 
the dynamics is glassy. 

At higher temperature, the probability of occurrence 
of Nat at to becomes higher and the length of the plateau 
smaller. Many chains in the Tpi, TSi or Tp2 jump to 
Nat. At T = 0.40, the probability of the ES becomes 
very high (« 0.3) at a time denoted by tu hi the early 
events and the probability of Nat equals 0.8 very fast. 
Between tM and t* , the solvent of the chains in ES relax 
in GS. Here, the ES of Nat acts as a strong dynamical 
attractor. 

The instantaneous flux from (mV) to (ma), given 

by km' a' — Mn<7 ^ma^m' er'^m' a' (^) ^m' a' .maimer (t) Can 

also be calculated. Figure 9 shows that, at T = 0.10, the 
kinetics is only guided by the difference of energy of the 
micro-states. 

The solvated chains have a very large probability to 
move towards a micro-state of lower energy. Between the 
initial time and to, the ES of Nat relax very fast to the 
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FIG. 9: The direction and magnitudes of the flux of the con- 
nexion of the pathway shown in fig. 7 at to and T = 0.10. The 
larger the flux, the wider is the line. Connections with tiny 
flux are not drawn. 
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FIG. 10: Same than fig.9 at T = 0.40. Flux in the GS pathway 
is the same for both simulations. 



GS of Nat. During this period, the ES of all structures 
relax to the GS. Between the times to and t* , TSi and 
TS 2 relax to Tpi, Tp 2 and Nat. The probabilities of GS 
of Tpi, Tp 2 and Nat increase until all other state have 
a infinitesimal probability and the probability of the Nat 
increases to one half of that of TS 2 - Then, all the fluxes 
become tiny and the dynamics is frozen. 

At T = 0.40, the equilibrium probabilities of the ES 
are infinitesimal. Then, we could also expect a kinetic 
which would only pass through the ground states, as for 
T = 0.10. Instead of this, the sampling of chains finds 
another strategy to overcome the barrier very fast. At 
time to, the probability of the GS of Nat is already of 
0.6. The flux between the pairs of states, drawn in fig. 10, 
show that the chains in GS of Tpi do not fold via the 
GS of the other structures of the pathway. First, they 
reach the ES of Tpi and then they pass through the ES 
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of the other structures and lastly they relax to the GS of 
Nat. This is a consequence of the larger transition rates 
between the ES than between the GS. In addition, the 
probabilities of the ES remain very small. 

The chains follow a pathway with large transition rates 
between improbable states for which the in going flux 
equals the out going keeping their probabilities small. 

At time to, the probability of Nat is around 0.7. The 
probabilities of the ES are quasi-null. The flux via the 
ES pathway is the same as via the GS pathway. The 
kinetics become very slow and then a plateau occurs in 
the curves of the probability of Nat as a function of the 
time shown previously. 



V. HOW WATER LUBRICATES OR FREEZES 
THE FOLDING. A PHYSICAL PICTURE. 

Curves of figs. 6 and 8 are not exactly the same but 
are similar. The basic mechanisms found for the direct 
fold in the case of the five-chain conformations pathway 
may be extended to the whole configurational space of 
the protein. Put together, the results of this paper allow 
us to give the picture of the folding given in fig.ll. To 
draw one of the envelop in the energy-entropy plot, the 
entropy, noted S&(E) for a = 0, associated to a given mi- 
croscopic energy, is calculated from the number of chain 
and solvent configurations, with formed hydrogen bonds. 
The same calculation, of the entropy noted S a (E) for 
a = 1, is done for the solvent without hydrogen bonds. 
The two functions S a (E) are related to the total num- 
ber of protein-solvent configurations whose total energy 
matches the energy values E: 



o/mic \i 



m a, ft 



where 8^ e \x) — 1 if — e/2 < x < e/2 and otherwise. 
They may be written as functions of the degeneracy of 
the macro-states: 

S a {E)=HY J 9raa 8& (E - H™)] 
m 

Moreover a parameter 9 m allows one to distinguish 
between the structures of the bottom of the configura- 
tional valleys, which may be kinetics traps in the folding. 
One defines 9 m = 1 for the macro-states, only connected 
to macro-states of higher energies and otherwise. Al- 
though the native conformation satisfies to this defini- 
tion, it can not be considered as a trap and the value of 
#Nat is set to 0. Considering the five structures of fig. 7, 
9 and 10 as an example, one has 9t Pi = 9t P2 = 1 an d 
#TSi = ^ts 2 = ^Nat = 0. Thus, the envelop of the trap 
region in the energy-entropy plot is a subset of the GS 
barrel, given by : 

S Tp (E) = ln^T e m9m , 8&(E - ?Cf c )] 
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FIG. 11: Top: At T = 0.10, the chains in any energy level 
of the ES-funnel relax very fast in the GS-barrel. Then, they 
move down slowly in the barrel, they reach the trap zone and 
the dynamics is frozen. Only a tiny fraction of the chains, 
synthesized in chain structures very close to Nat, reach it in a 
reasonable time. Bottom: The picture of the folding at higher 
temperature but below To where the ES of the chains still 
has an infinitesimal equilibrium probability, depicts a different 
mechanism. At T = 0.40, the chains in the GS-barrel move to 
the ES-funnel. Then, most of them fold very fast towards the 
bottom of the ES-funnel (in the ES of the native structure) 
and they relax in the bottom of the GS barrel (in the GS of the 
native structure). They get round the trap zone by passing in 
the ES-funnel where the transition rates are very high. A few 
chains relax in the GS-barrel during their descent in the ES- 
funnel and then they have slow dynamics. The temperature 
where the folding mechanism switches from one scenario to 
the other one is the glass temperature of the system. 



where g m - t o and %™ a o are for the degeneracy and energy 
of the GS of the structure to. 

Figure 11 shows the three surfaces drawn on the same 
plot. The GS-barrel (respectively ES-funnel) is associ- 
ated to the GS (respectively ES) of the chain structures. 
A part of the surface of the GS-barrel is populated with 
the trap conformations We have already shown, in this 
paper, that the connections between two chain structures 
in the ES-funnel have small relaxation times and that in 
the GS-barrel long relaxation times. This results from 
the following mechanism. A single protein and its solvent 
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is in a given configuration, at a given time, which belongs 
to a corresponding macro-state, and not in all the con- 
figurations of a the macro-state. As a consequence, the 
transition rates between one protein-solvent configura- 
tion of the macro-state (ma) and those of a macro-state 
(m'cr') depends on the energy difference between these 
states and also on the degeneracy of (m'cr') but not on 
the degeneracy of (ma). 

In particular, the connexion from a given configuration 
of the GS to those of the ES of the same chain confor- 
mation involves a huge number of routes which increase 
the energy On the opposite, a few connexions allow the 
backward transition which decreases the energy. 

At "very low" temperature, the proteins and solvent fol- 
low pathways which minimize the microscopic energy. 
Then, they very slowly evolve, along the few routes of the 
GS barrel and fall rapidly in some trap conformations. 

At "not too low" temperature, proteins and solvent fol- 
low pathways where the number of routes is maximized. 
Then, they quickly evolve along the vast possibilities of 
routes of the ES funnel without traps and reach very fast 
the native structure. 

As it has previously been shown, the temperature of 
glass transition which separates the two mechanisms is 
not clearly defined because it depends on the ratio of 
folding proteins 

In addition, we mention that, the calculation of the 
partition function is independent on the informations 
concerning the network of connexions. As a consequence, 
the glass temperature is not related to the temperatures 
(T , T* and T m ) defined above. This explains why the 
excited states of the first shell lubricate the folding un- 
der conditions where only the ground states have non-nil 
equilibrium probabilities and why the folding is frozen at 
lower temperature. 



VI. CONCLUSION. 

We have shown that a model of protein-solvent which 
takes into account the difference of degeneracies of the 
bulk solvent and the first shell solvent with mainly 
formed or mainly broken hydrogen bonds permits an un- 
derstanding of the mechanism which may lead to quasi- 
instantaneous folding of a sufficiently significant ratio of 
the proteins in solution. Figure 11, shows that two dis- 
tinct folding mechanisms exist. In the first one, the fold- 
ing times are very large and in the second one very short. 
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Appendix A. 

The probability of occurrence of the conformation m 
and the solvent in micro-state (a, [3) at time t is denoted 
by p™ 1 ^ (t). The master equation of the system is written 



dp: 



mic 

maf 



At 



E * 



maf3;m'a' 0' P m 'a' 



(3) 



where X ma p. m ' a 'p> — X(m'a'f3' — > ma/3) is the tran- 
sition rate from configuration (m'a'[3 r ) to (ma(3). The 
diagonal terms which take into account of the transition 
from (ma/3) to the other configurations, are : 



X. 



map:mat 



E 

(m' a' /3')^(ma/: 



Xm'n'fl' 



The detailed balance conditions, 

^-maj3;m'a' 0'P m 'a' f3' -^-m' a' fi' ;mal3P ma 

allow to write the rate of transition as : 



(4) 



X 



v: 



(0) 



where V, 



ma/3;m'a' 0' 



(0) 



— a T (n mal3 , n m , a ,p,) (i>) 



K1°L = 1 if the two structures are 

I I VI III III III 

connected by a corner flip and tail moves [82] (see 
fig. 12) whatever the solvent configurations and other- 
wise. The acceptance function is ar (x; x') = [l+cxp((a;— 
X ')/T)]~ 1 and r™ 1 ^, is a symmetric function : r™ 1 ^, = t c 
if m 7^ m' and r™ 1 ^, = t s if m ~ m' . 

The probability of occurrence of the macro-state (ma) 
at time t denoted 

k™ = E«*) 

where 6(0) = 1 and S(n) = if n ^ and the following 
relations : 

E & = E E pEwV - ^')) 

ql' fi' a' ol' fi' 

E-rjmac 
' ma' 



^S(a - a((3)) = g ma 

a/3 

will be used below. 

Now, we rewrite the master equation 3 as : 



dp 



E^f St*-*™ 



a/3 



J2 E S(a - a(f3))X ma 

fim'a'fi' Pm'a'3' 

a/3 m'a'fi' 
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yields : 



dp 



£ - °w = ££*(— £ £ 



- (J (/3'))*a T (^^;^™; c Q '^) iC^' 



The evolution equation is rewritten : 



FIG. 12: Two types of moves, with different characteristic 
times, are considered in the dynamics: the protein monomer 
move (dashed arrows) or the solvent configuration transi- 
tion (other arrows). For this last event, the connection be- 
tween solvent configurations of the same macroscopic level 
do not affect the kinetics. The solvent transition is sup- 
posed to have smaller relaxation time than the monomer 
move. The relaxation times of the connection are defined 
as follows: we solve eqs.3 and 5 for an isolated connexion 
between two states (ma) and (m'a 1 ) leading to: p™^(t) = 



„(°°) 



Pma , + bw(0) - exp(-i/<^)- < 



is chosen as 



t c if the connexion is between two different protein structures 
and is chosen as t 3 if only water moves (r s <C t c ). The set of 
one monomer moves considered here, is the corner flip (shown 
in this figure) and the tail move (not shown). 



with 



AV mac (t) 

u/ ma W _ \ \ V T> 



mac 
m' a' 



(6) 



Y 



V {0) , 

mm' 



mcT.m'tT' 



"TtCj^mV') ( 7 ) 



In addition, we mention that this result also satis- 
fies the following balance equation: Y ma -, m ' r7 rV°^, a , — 

1 m' a' ,m(X > rricr ' 
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